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Abstract 

The processes, resulting in the transcription of RNA, are intrinsically noisy. It was observed 
experimentally that the synthesis of mRNA molecules is driven by short, burst-like, events. An 
accurate prediction of the protein level often requires one to take these fluctuations into account. 
Here, we consider the stochastic model of gene expression regulated by small RNAs. Small RNA 
post-transcriptional regulation is achieved by base-pairing with mRNA. We show that in a strong 
pairing limit the mRNA steady state distribution can be determined exactly. The obtained an- 
alytical results reveal large discrepancies in the deterministic description, due to transcriptional 
bursting. The proposed approach is expected to work for a large variety of multi-species models. 



PACS numbers: 



Models describing nonlinear interactions between several species of particles are widely 
spread in many areas of science. Let us mention a couple of the famous examples: the 
Predator-Prey (PP) model in biology and the Susceptible- Infected-Removed (SIR) epidemic 
model. Quantitative description of gene regulation by small RNA also requires the analysis 
of a nonlinear two-species interacting system. 

Small noncoding RNAs (sRNAs) are generally synthesized as independent transcripts 
that interact with mRNA. The functional role of many of those sRNAs is the inhibition 
of translation by base pairing with the mRNA in the ribosome-binding site, for review see 
l|, l2|]. Hence, the sRNA and mRNA two-species recombination process has to be integrated 
in the gene expression model. 

The models mentioned above are inherently random. Stochasticity usually arises due to 
a low abundance of the interacting species, since they are present in discreet numbers. As 
a result, the deterministic models often fail to correctly predict the characteristics of the 
system for a certain range of parameters. 

In order to account for noise due to the discreetness of the particles, the dynamical 
rules of evolution are interpreted as probabilistic processes. In many instances, transition 
probabilities, governed by these dynamical rules, are nonlinear in terms of the number of 
particles involved. 

However, in the modeling of the multi-species systems the only source of nonlinearity 
is often the interaction between different species. Interactions within the same species are 
usually modeled by the linear birth-death processes which are sufficient to capture the 
system's dynamics. In order to model inter-species interaction, one is bound to introduce 
nonlinear kinetic terms. 

Even single site models, that describe nonlinear, interacting multi-species particles, are 
difficult to treat analytically. An exact solution can be usually derived only for models with 
special symmetries (conservation laws). 

If the symmetries between different species are broken by either dynamical rules or the 
diffusion for spatially extended problems, then there is little hope to achieve an exact solution 
for a multi-species nonlinear model. 

Nevertheless, several powerful approximation schemes were developed to treat various 
aspects of multi-species problems analytically. They usually rely upon the separation of 
time-scales in the model This separation sometimes allows one to average out 
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one random variable and reduce the system into an effectively single species problem. The 
reduced problem, however, usually remains nonlinear and further approximations may be 
required in order to solve it. 

We propose a similar approach based on the time-scale separation, that leads to a linear, 
coupled multi-species model. Based on several readily solvable examples, we believe that 
our method can be proved useful in dealing with a large variety of multi-species systems. 
The main advantages of the proposed method are symmetry preservation and linearity of 
the effective evolution equations. 

In many biological and epidemic models, the coupling 7 controlling the interaction 
strength is large in comparison with the rest of the kinetic parameters in the system. For 
example, the capture of prey by the predator happens on a much shorter time-scale than 
the birth and death of both species. 

Exactly the same argument is applicable to some epidemic models. The typical contrac- 
tion time is fairly short, compared to the time-scale of the recover process. Similarly, in the 
model of gene regulation by sRNA recombination rate is often large. 

Surprisingly, strong interaction may lead to the simplification of the model! Indeed, in 
the limit of the infinitely large interacting coupling 7-^00, only one species may be present 
at a time. For the PP model, this means that at any site of the lattice either predators or 
prey are present, not both at the same time, since predation is instantaneous. 

Since in many models the only source of nonlinearity is the inter-species interaction 
term, the limit 7 — > 00 results in set of linear coupled evolution equations for each species. 
Moreover, this limit preserves all the symmetries of the model at hand. Hence, even for 
models with weak inter-species interaction, the obtained equations can be used to identify, 
say, critical exponents of the system. 

To demonstrate how an exact solution may be achieved in the strong interaction limit, 
we consider the very interesting problem of post-translational gene regulation. Unlike PP 
and SIR models, noninteracting terms can be, in this case, rather complicated due to tran- 
scriptional bursting. Nevertheless, the sRNA regulation model can be solved exactly in the 
strong interaction limit. 

The model of sRNA regulation is depicted in Fig. [1] and the deterministic (mean-field) 
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FIG. 1: The model of sRNA regulation. Both mRNA and sRNA molecules are synthesized inde- 
pendently with arrival rates k m and k s correspondingly. They decay either naturally or as a result 
of mutual degradation with rate 7. 



evolution equations for the corresponding RNA levels are 



IT] 



d t m = k m -T m 1 m-'yms, (1) 
d t s = k s — r^s — 7ms. (2) 

Here kj and r" 1 are corresponding arrival and decay rates for species j = {m, s}. 

In terms of the Master equation, these rates can be interpreted as probabilities per unit 
time of the synthesis and decay events. 

A pair of molecules of mRNA and sRNA can recombine into an inert complex that quickly 
decays with the probability rate 7 |7]. In what follows, we assume that the rate 7 is fast 
compared to the synthesis and natural decay rates of both species mRNA and sRNA. 

If the rate 7 is infinite, it is clear that, at any given time, at most one species of particles 
is present. Therefore, a probability to find m mRNA and n sRNA molecules at any time t 
is the sum 

V(m, n; t) = 5{n)V m {m] t) + 5(m)V s (n; t), (3) 

where 5(ri) is the Kronecker delta function, which has the value 1 for n — 0, and zero 
otherwise. Here, V m and V s are the reduced probability distribution for mRNA and sRNA 
molecules. 

Let us now derive a detailed balance condition for the reduced mRNA probability distri- 
bution, which is essentially a Master equation at the steady state. To this end, we assume 
that RNA synthesis for both species occurs in bursts. 

Indeed, in recent exciting experiments, several groups were able to probe the statistics of 



various cellular components for a single DNA molecule 



ay, 
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instances that the synthesis of mRNA molecules is driven by short, burst-like, events [8j. 
For n > 1, the detailed balance condition reads 

n oo 

k m ^2 pm(i)'Pm(n - i) + k m ^2 p m (i)V a (i - n) 

i=l i=n 

oo 

+T~\n + l)V m {n + 1) + k s Ps(i)V m {n + i) 

i=i 

= [fc m + fc s + ^ 1 n]P m (n). (4) 

Here, p m (z) is the conditional probability distribution to find i mRNA molecules in the burst, 
provided that the burst is not empty. The analogous quantity, p s (i), characterizes the sRNA 
burst distribution. 

The mRNA detailed balance condition Eq. (jl]), and its symmetric counterpart for sRNA, 
fully describe the steady state distribution of the system, once the burst statistics p(n) are 
given for each species. 

In several experiments, various microscopic parameters of the gene expression model were 
obtained, such as, the mean burst frequency and the mean number of mRNAs in the burst 
(transcript number) 8|,|9|]. In bacteria, the number of mRNA molecules detected in synthesis 
bursts was governed by a geometric probability distribution 8] 

Pm(n) = (1 - U m ) U^ 1 , 71=1,2,... (5) 

It is, then, reasonable to expect that sRNA burst statistics p s is also given by the geo- 
metric distribution with a different control parameter, u s . 

In order to solve Eq. (j4j, it is convenient to define burst and steady state generating 
functions 

9j ( x ) = ^ Pj (n) = £^£, (6) 

71=0 J 
oo 

(7) 

n=0 

for each species j = {m, s}. Note that, due to the definition Eq. (J7J), the generating functions 
are analytic in the interval x G [—1,1], since < Vj(n) < 1 for any integer n. 

The average quantities can now be derived by evaluating the derivatives of these gener- 
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ating functions at point x = 1, e.g., 

oo ^ 

n m = ^np m (n) = d x g m {\) = — , (8) 

n=i 1 Um 

oo 

(m) r = ^2nV m (n) = d x G m (l), (9) 

n=0 

where n m and (m) r are the mRNA transcript number and regulated mean level correspond- 
ingly. 

For geometric burst distributions we derive from Eq. (jlj) 

G m (x) + G s (u m ) 



dxG m [x) T m k 



m -. 

1 VjyyjX 



III 

, G m (xj G m (u s ) 
r m k s . (10) 

X — 11. 



The symmetric equation, with interchanged indices m and s, holds for the sRNA steady 
state generating function. 

The point x = u s corresponds to blow-up in the normalized homogeneous solution of 
Eq.([IOD 

i_uA * ■ f - 

x-u s J \l-u m xj 

The analyticity of functions G m (x) and G s (x) on the entire interval {—1,1} means that the 
following condition should be satisfied in order to remove this singularity: 

G m (l) = 4K, l)G m (u s ) + Il(u si l)G s (u m ). (12) 

Here we defined the finite integrals for particular solution representation: 

Il(a ' b)s S/ ix (xX)h m (,Y (13) 

fjfl,b)= fix ,, *" T " , r . (14) 



(1 u m x^jh m (^x 

The integrals above can be, in principle, written in terms of hypergeometric function, but 
it does not really help to understand their properties. One can show, however, that these 
integrals are simply related: 

4M) - 4M) = h m \a) - h m \b). (15) 
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The resulting analytic generating function of the mRNA steady state distribution is given 



by 



G m (x)=h m (x) [l^(u s , x)G m (u s )+I^(u s , x)G s (u m )] . 



(16) 



The probability to find an empty system in terms of the reduced distributions can be 
interpreted as either V m (0) or V s (0) or a combination of both. Due to the symmetry of the 
representation P m (0) = V s (0) = ^V(0, 0), and we obtain from Eq. ffTBT) 



MO) [4(0, u s )G m (u s ) + ll(0, u s )G s (u m )] 
h s (0) [ll(0,u m )G 8 (u m ) +I 2 s (0,u m )G m (u s 



(17) 



Here, the definitions of sRNA's functions ll(a,b), Ig(a,b) and h s (x) are symmetric with 
respect to the permutation of indices of the corresponding mRNA's functions. 
Finally, one derives 



G m (l) + G s (l) = J2 V rn(m) + J2 V s(n) = 1, 



(18) 



which is the conservation of probability condition. 

Combining all conditions together: Eq. (1121) . the symmetric equation for sRNA, Eq. ( fTTT) . 
and Eq. ( fTBT) . we obtain a set of four linear algebraic equations for four unknowns: 
G m (l),G m (u s ) and G s (l),G s (u m ). 

The generating functions G m (x) and G s (x) immediately follow from Eq. ffTBT) and the 
symmetric equation for sRNA. These generation functions can be used to study various 
aspects of sRNA regulation. 

A particularly interesting quantity is the mean regulated leve 
protein level is linearly proportional to the one for mRNA [8 
proportionality is the same for regulated and unregulated systems. Hence, one can calculate 
the repression in protein expression due to sRNA regulation by comparing regulated and 
unregulated mRNA levels. 

The regulated mRNAs average can be obtain by setting x = 1 in Eq. ( fTOl) : 



of mRNA. Indeed, the mean 
ll^ . The coefficient of the 



(m) r = (m) 



2^s ( M m, 1) 



i + ri(u s ,i) + ii(u m A)\ 



a 



2^m(^s, 1) 



l+i*K,l)+i?(«m,l) 



(19) 
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where (m) = n m T m k m and (s) = n s r s k s are unregulated RNA levels. Here, a = r m /r s is a 
ratio of the mRNA and sRNA lifetimes. The symmetric equation is derived for sRNA level 
(s) r by permutation of indices m and s. 

The ratio of the mRNA and sRNA lifetimes satisfy the following condition 

(m) - (m) r 

a = IsTW m 

Interestingly enough, this condition is extremely robust. It holds for finite values of the 
coupling rate 7. Moreover, the condition Eq. (1201) is satisfied for any burst distribution and 



even for the arbitrary arrival time statistics of the RNAs bursts ll|! This robustness is 
a result of the symmetry between mRNA and sRNA, since recombination is mutual and 
involves the same amount of molecules of each species. 

One of the interesting properties of the regulated mean mRNA level, Eq. (TiTJj) . is the 
following. Let us define repression level R ratio 

(in)r , \ 

R=Yf. 21 
\m) 

For the same values of unregulated levels of mRNA and sRNA, the maximal repression is 
achieved for a single transcript at a time, n m = 1 for any value of the parameter n s . This 
is not surprising, since this limit corresponds to the maximal burst frequency of mRNA. 
Moreover, decreasing the sRNA transcript number increases repression for the same reason. 
The transcript numbers n m — n s — 1 correspond to the absolute minimum of the ratio R. 

An important question is how the deterministic (mean-field) prediction compares with 
the stochastic one. From the deterministic evolution equation Eq. (TfJ one derives the steady 
state regulated level in 7^00 limit 

m r = (m) - a(s), (s) < s c , (22) 
m T = 0, (s) > s c , (23) 

where m r and s r are regulated mean-field levels for each species. Therefore, according to the 
mean-field prediction above, the mRNA steady state level vanishes once the critical value 
of the unregulated sRNA level s c = m/a is reached. 

This mean-field threshold-like solution 0, [7] is, in fact, a crude lower bound for the 
stochastic average Eq. ffl9|) ; see Fig. [21 
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FIG. 2: The protein repression level R is shown as a function of mRNA transcript number n m . 
The RNAs' unregulated levels are fixed (m) = 2(s) = 1 in both graphs, and the insets correspond 
to higher unregulated levels (m) = 2(s) = 10. The RNAs' lifetime ratio is sampled at values 
a = 0.5 (above critical) and 4 (below critical) for left to right graphs accordingly. The solid blue 
curve corresponds to sRNA transcript number n s = 1, dot dashed orange and dashed green curves 
correspond to n s = 5 and 40 accordingly. Finally, horizontal dashed line represents the mean-field 
solution Eq. (1221) . 



The average repression level in protein expression shows significant deviation from the 
mean-field prediction. This deviation becomes larger with an increase in transcriptional 
bursting. In particularly, the threshold s c in protein expression, predicted by the deter- 
ministic approach, can be much smaller due to transcriptional noise. This behavior was 
anticipated by Mehta et al., 6|. 

The discrepancy in deterministic description is particularly apparent for small unregu- 
lated RNAs' levels. The mean-field solution Eq. ( |22l) scales linearly with unregulated RNAs' 
levels and hence, incorrectly predicts the same repression regardless of RNAs abundance. 

The derived generating function can be used to further analyze different aspects of reg- 
ulation. Quantities, such as noise strength, Fano factor, rare events probabilities, etc., can 
be derived exactly from the result Eq. (TT6l) . 

The strong interaction limit may lead to the significant simplification of the spatially 
extended model models as well. Consider Prey-Predator model on a lattice in this limit (a 
very similar description holds for the SIR model as well). Any lattice site has now only one 
residing species at a time. 

All local processes (birth and death) occur with transition rates that are linear with 
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respect to the number of particles present at each site of the lattice. The modified diffusion 
rules can be summarized as follows: 

The diffusion of prey from site i to site j always results in an increase in the population 
of the species residing at site j by one. It does not matter what species was present at site j 
before the jump. The diffusion of predators from site % to site j always results in an increase 
in the population of the predators on site j by one, with respect to the previous occupancy 
of any species. 

The transition rates due to these modified diffusion rules are linear as well. The linearity 
of the all transition rates can be utilized in order to treat models analytically. Therefore, we 
hope that even for spatially extended models, the strong interaction limit may ultimately 
lead to an exact solution. 
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